12 research outputs found

    An original framework for understanding human actions and body language by using deep neural networks

    The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. The processing of body movements, in turn, plays a key role in the action recognition and affective computing fields: the former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition and video surveillance. In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first one, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) for the recognition of sign language and semaphoric hand gestures is proposed; the second module presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs) is provided. The performance of LSTM-RNNs is explored in depth, due to their ability to model the long-term contextual information of temporal sequences, making them suitable for analysing body movements. All the modules were tested on challenging datasets, well known in the state of the art, showing remarkable results compared to current literature methods.
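The long-term memory mechanism that makes LSTM-RNNs suitable for temporal sequences of body movements can be illustrated with a minimal sketch of a single LSTM forward step. This is not the thesis framework, only a generic numpy illustration; the gate stacking order and the toy sequence dimensions are assumptions for the example.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D) input weights, U: (4H, H) recurrent
    weights, b: (4H,) bias, gates stacked as [input, forget, cell, output]."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = 1 / (1 + np.exp(-z[:H]))           # input gate: admit new information
    f = 1 / (1 + np.exp(-z[H:2 * H]))      # forget gate: keep long-term context
    g = np.tanh(z[2 * H:3 * H])            # candidate cell update
    o = 1 / (1 + np.exp(-z[3 * H:]))       # output gate
    c = f * c_prev + i * g                 # cell state carries long-term memory
    h = o * np.tanh(c)                     # hidden state: per-frame output
    return h, c

# toy gesture sequence: T frames of D-dimensional skeleton features
rng = np.random.default_rng(0)
T, D, H = 5, 8, 16
W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(T):
    h, c = lstm_step(rng.standard_normal(D), h, c, W, U, b)
```

The final hidden state `h` summarises the whole sequence and would feed a classifier over gesture or action labels.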

    Adaptive bootstrapping management by keypoint clustering for background initialization

    The availability of a background model that describes the scene is a prerequisite for many computer vision applications. In several situations, the model cannot be easily generated when the background contains some foreground objects (i.e., the bootstrapping problem). In this letter, an Adaptive Bootstrapping Management (ABM) method, based on keypoint clustering, is proposed to model the background on video sequences acquired by mobile and static cameras. First, keypoints are detected on each frame by the A-KAZE feature extractor; then Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is used to find keypoint clusters. These clusters represent the candidate regions of foreground elements inside the scene. The ABM method manages the scene changes generated by foreground elements, both in the background model initialization, managing the bootstrapping problem, and in the background model updating. Moreover, it achieves good results with both mobile and static cameras and requires a small number of frames to initialize the background model.
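The clustering stage described above can be sketched with a minimal, self-contained DBSCAN over 2D keypoint coordinates. This is an illustrative toy (in practice one would use a library implementation, e.g. scikit-learn's, on A-KAZE keypoints); the point sets, `eps`, and `min_pts` values below are assumptions for the example.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster id per point, -1 = noise."""
    n = len(points)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                      # already assigned, or not a core point
        labels[i] = cluster               # grow a new cluster from core point i
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:
                    queue.extend(neighbors[j])   # expand through core points
        cluster += 1
    return labels

# two dense keypoint groups (candidate foreground regions) plus one stray point
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.2, (20, 2)),
                 rng.normal(5.0, 0.2, (20, 2)),
                 [[10.0, 10.0]]])
labels = dbscan(pts, eps=1.0, min_pts=4)
```

Each cluster marks a candidate foreground region; isolated keypoints are discarded as noise.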

    Online separation of handwriting from freehand drawing using extreme learning machines

    Online separation between handwriting and freehand drawing is still an active research area in the field of sketch-based interfaces. In recent years, most approaches in this area have focused on the use of statistical separation methods, which have achieved significant results in terms of performance. More recently, Machine Learning (ML) techniques have proven to be even more effective by treating the separation problem as a classification task. Nevertheless, even with these techniques several aspects can still be considered open problems, including: 1) the trade-off between separation performance and training time; 2) the separation of handwriting from different types of freehand drawings. To address these drawbacks, in this paper a novel separation algorithm based on a set of original features and an Extreme Learning Machine (ELM) is proposed. Extensive experiments on a wide range of sketched schemes (i.e., text and graphical symbols), more numerous than those usually tested in key works of the current literature, have highlighted the effectiveness of the proposed approach. Finally, measurements of accuracy and speed of computation, during both training and testing stages, have shown that the ELM can be considered, in this research area, the best choice even when compared with other popular ML techniques.
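The speed advantage of an ELM comes from its training procedure: the hidden layer is random and fixed, so only the output weights are learned, in closed form by least squares. A minimal numpy sketch of that idea (not the paper's feature set or classifier; the toy two-class problem is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

def elm_train(X, y, hidden=80):
    """ELM training: random fixed hidden layer, output weights by least squares."""
    W = rng.standard_normal((X.shape[1], hidden))   # random input weights (never trained)
    b = rng.standard_normal(hidden)
    H = np.tanh(X @ W + b)                          # random nonlinear feature map
    beta = np.linalg.pinv(H) @ y                    # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# toy 2-class problem: label points inside vs. outside a circle
X = rng.uniform(-1, 1, (200, 2))
y = np.where(np.linalg.norm(X, axis=1) < 0.6, 1.0, -1.0)
W, b, beta = elm_train(X, y)
acc = np.mean(np.sign(elm_predict(X, W, b, beta)) == y)
```

Because training reduces to one pseudoinverse, there is no iterative backpropagation, which is what drives the favourable accuracy/training-time trade-off reported above.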

    A keypoint-based method for background modeling and foreground detection using a PTZ camera

    Automatic scene analysis is still a topic of great interest in computer vision due to the growing possibilities provided by increasingly sophisticated optical cameras. Background modeling, including its initialization and updating, is a crucial aspect that can play a main role in a wide range of application domains, such as vehicle tracking, person re-identification and object recognition. In any case, many challenges still remain partially unsolved, including camera movements (i.e., pan/tilt), scale changes (i.e., zoom-in/zoom-out) and deletion of the initial foreground elements from the background model. This paper describes a method for background modeling and foreground detection able to address all the mentioned challenges. In particular, the proposed method uses spatio-temporal tracking of sets of keypoints to distinguish the background from the foreground. It analyses these sets with a grid strategy to estimate both camera movements and scale changes. The same sets are also used to construct a panoramic background model and to delete the possible initial foreground elements from it. Experiments carried out on some challenging videos from three different datasets (i.e., PBI, VOT and Airport MotionSeg) demonstrate the effectiveness of the method on PTZ cameras. Other videos from a further dataset (i.e., FBMS) have been used to measure the accuracy of the proposed method with respect to some key works of the current state-of-the-art. Finally, some videos from another dataset (i.e., SBI) have been used to test the method on stationary cameras.
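One way a grid strategy can separate camera motion from object motion is to take per-cell statistics of keypoint displacements and then a robust consensus across cells: cells dominated by a moving foreground object disagree with the majority and are outvoted. The sketch below is a simplified translation-only illustration of this idea (the paper's method also handles zoom; the grid size, frame size, and synthetic keypoints are assumptions):

```python
import numpy as np

def estimate_camera_shift(prev_pts, curr_pts, frame_size, grid=(4, 4)):
    """Global camera translation as the median of per-cell median
    keypoint displacements; foreground-dominated cells are outvoted."""
    h, w = frame_size
    disp = curr_pts - prev_pts
    cell_medians = []
    for gy in range(grid[0]):
        for gx in range(grid[1]):
            in_cell = ((prev_pts[:, 0] >= gx * w / grid[1]) &
                       (prev_pts[:, 0] < (gx + 1) * w / grid[1]) &
                       (prev_pts[:, 1] >= gy * h / grid[0]) &
                       (prev_pts[:, 1] < (gy + 1) * h / grid[0]))
            if in_cell.any():
                cell_medians.append(np.median(disp[in_cell], axis=0))
    return np.median(np.array(cell_medians), axis=0)

# synthetic frame pair: global pan of (12, -5) px, plus a moving object
rng = np.random.default_rng(1)
prev = rng.uniform(0, [640, 480], (100, 2))
curr = prev + np.array([12.0, -5.0])
curr[:10] += [40.0, 30.0]                 # 10 keypoints on a foreground object
shift = estimate_camera_shift(prev, curr, (480, 640))
```

Keypoints whose displacement deviates strongly from the recovered shift are then candidate foreground, while the consensus shift drives the panoramic background alignment.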

    A Multipurpose Autonomous Robot for Target Recognition in Unknown Environments

    In recent years, the technological improvements of consumer robots, in terms of processing capacity and sensors, have enabled an ever-increasing number of researchers to quickly develop both scale prototypes and alternative low-cost solutions. In these contexts, a critical aspect is the design of ad-hoc algorithms according to the features of the available hardware. This paper proposes a prototype of an autonomous robot for mapping unknown environments and recognizing target objects. During the setup phase, one or more target objects are shown to the RGB camera of the robot which, for each of them, extracts and stores a set of A-KAZE features. Afterwards, the robot adopts ultrasonic distance measurement and the RGB stream to map the whole environment and search for A-KAZE features matching those previously acquired. The paper also reports both preliminary tests carried out in a reference indoor environment and a case study performed in an outdoor one, which validate the proposed system.
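The recognition step, matching stored target descriptors against those extracted from the live RGB stream, is typically done by nearest-neighbour search with a ratio test. A minimal sketch with synthetic descriptors (not the paper's pipeline; real A-KAZE descriptors would come from a library such as OpenCV, and the ratio threshold here is an assumption):

```python
import numpy as np

def match_features(target_desc, scene_desc, ratio=0.75):
    """Nearest-neighbour matching with a ratio test: accept a match only
    if the best distance is clearly smaller than the second best."""
    dists = np.linalg.norm(target_desc[:, None] - scene_desc[None, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        j1, j2 = np.argsort(row)[:2]
        if row[j1] < ratio * row[j2]:     # unambiguous nearest neighbour
            matches.append((i, int(j1)))
    return matches

# synthetic setup: 20 stored target descriptors; the scene contains noisy
# copies of the first 10 plus 30 unrelated distractors
rng = np.random.default_rng(7)
target = rng.standard_normal((20, 64))
scene = np.vstack([target[:10] + 0.05 * rng.standard_normal((10, 64)),
                   rng.standard_normal((30, 64))])
matches = match_features(target, scene)
```

A sufficient number of surviving matches in a camera view would signal that the target object has been found in the environment.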

    A new descriptor for Keypoint-Based background modeling

    Background modeling is a preliminary task for many computer vision applications, describing the static elements of a scene and isolating the foreground ones. Defining a robust background model of uncontrolled environments is a current challenge since the model must manage many issues, e.g., moving cameras, dynamic background, bootstrapping, shadows, and illumination changes. Recently, methods based on keypoint clustering have shown remarkable robustness, especially to bootstrapping and camera movements, while highlighting limitations in the analysis of dynamic background (i.e., trees blowing in the wind or gushing fountains). In this paper, an innovative combination of the RootSIFT descriptor and average pooling is proposed in a keypoint clustering method for real-time background modeling and foreground detection. Compared to renowned descriptors, such as A-KAZE, this combination is invariant to small local changes in the scene, thus proving more robust in dynamic background cases. Results of experiments carried out on two benchmark datasets demonstrate how the proposed solution improves on the previous keypoint-based models and outperforms several works of the current state-of-the-art.
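The two ingredients named above are simple to state: the RootSIFT transform L1-normalises a descriptor and takes an element-wise square root (so Euclidean comparisons correspond to the Hellinger kernel), and average pooling blends the descriptors of a neighbourhood to damp small local changes. A minimal sketch with synthetic non-negative descriptors (the combination order and pooling neighbourhood here are illustrative assumptions, not the paper's exact pipeline):

```python
import numpy as np

def root_descriptor(desc, eps=1e-7):
    """RootSIFT-style transform: L1-normalise, then element-wise sqrt.
    The result has (near) unit L2 norm."""
    desc = desc / (np.abs(desc).sum(axis=-1, keepdims=True) + eps)
    return np.sqrt(desc)

def pooled_descriptor(descs):
    """Average pooling over a keypoint neighbourhood's transformed
    descriptors, smoothing out small local changes in the scene."""
    return root_descriptor(descs).mean(axis=0)

# 5 synthetic non-negative 128-d descriptors from one neighbourhood
rng = np.random.default_rng(3)
patch = np.abs(rng.standard_normal((5, 128)))
d = pooled_descriptor(patch)
```

Pooling makes the final descriptor less sensitive to flicker from dynamic background elements, which is the robustness gain claimed over per-keypoint descriptors such as A-KAZE.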